-
Neuromorphic hardware, designed to mimic the neural structure of the human brain, offers an energy-efficient platform for implementing machine-learning models in the form of Spiking Neural Networks (SNNs). Achieving efficient SNN execution on this hardware requires careful consideration of various objectives, such as optimizing utilization of individual neuromorphic cores and minimizing inter-core communication. Unlike previous approaches that overlooked the architecture of the neuromorphic core when clustering the SNN into smaller networks, our approach uses architecture-aware algorithms to ensure that the resulting clusters can be effectively mapped to the core. We base our approach on a crossbar architecture for each neuromorphic core. We start with a basic architecture where neurons can only be mapped to the columns of the crossbar. Our technique partitions the SNN into clusters of neurons and synapses, ensuring that each cluster fits within the crossbar's confines, and when multiple clusters are allocated to a single crossbar, we maximize resource utilization by efficiently reusing crossbar resources. We then expand this technique to accommodate an enhanced architecture that allows neurons to be mapped not only to the crossbar's columns but also to its rows, with the aim of further optimizing utilization. To evaluate the performance of these techniques, assuming a multi-core neuromorphic architecture, we assess factors such as the number of crossbars used and the average crossbar utilization. Our evaluation includes both synthetically generated SNNs and spiking versions of well-known machine-learning models: LeNet, AlexNet, DenseNet, and ResNet. We also investigate how the structure of the SNN impacts solution quality and discuss approaches to improve it.
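The basic clustering constraint described above can be sketched as a greedy packing problem: a crossbar with R rows and C columns can host at most C neurons (one per column), and the combined presynaptic fan-in of those neurons must fit in R rows. The sketch below is a minimal illustration under these assumptions, not the paper's actual algorithm; the function and variable names are hypothetical.

```python
def partition_snn(neurons, fan_in, rows, cols):
    """Greedily pack neurons into clusters that fit an R x C crossbar.

    neurons: list of neuron ids, in traversal order
    fan_in:  dict mapping neuron id -> set of presynaptic neuron ids
    rows, cols: crossbar dimensions (fan-in limit, neuron limit)
    Returns a list of (neuron_list, input_set) clusters.
    """
    clusters = []
    current, current_inputs = [], set()
    for n in neurons:
        new_inputs = current_inputs | fan_in[n]
        # The cluster fits if it has a free column for the neuron
        # and its combined fan-in still fits in the crossbar rows.
        if len(current) + 1 <= cols and len(new_inputs) <= rows:
            current.append(n)
            current_inputs = new_inputs
        else:
            clusters.append((current, current_inputs))
            current, current_inputs = [n], set(fan_in[n])
    if current:
        clusters.append((current, current_inputs))
    return clusters
```

A real architecture-aware partitioner would also decide which clusters share a crossbar to reuse rows across clusters; this sketch only enforces the per-cluster fit.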
-
In integrated circuit design, analysis of wafer map patterns is critical to enhance yield and detect manufacturing issues. With the emergence of novel wafer map patterns, there is an increasing need for robust artificial intelligence models that can both accurately classify seen patterns and detect ones not seen during training, a capability known as open-world classification. We develop a novel solution to this problem: WaferCap, a Deep Capsule Network designed for wafer map pattern classification and equipped with a rejection mechanism. When evaluated on the WM-811k dataset, WaferCap significantly surpasses existing methods, achieving 99% accuracy on fully seen patterns while demonstrating robust performance in open-world settings by effectively detecting unseen wafer map patterns.
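A common way to realize a rejection mechanism of this kind is to threshold the strongest class activation (in a capsule network, the length of the output capsule vectors) and reject inputs whose best score falls below the threshold. The sketch below illustrates that idea only; it is an assumption for illustration, not WaferCap's actual mechanism, and the threshold value is arbitrary.

```python
import numpy as np

def classify_with_rejection(capsule_lengths, threshold=0.5):
    """Return the predicted class index, or -1 ("unseen pattern")
    if the strongest capsule activation is below the threshold."""
    scores = np.asarray(capsule_lengths)
    k = int(np.argmax(scores))
    return k if scores[k] >= threshold else -1
```

In an open-world evaluation, inputs mapped to -1 would be routed to a separate bin of candidate novel patterns rather than forced into a known class.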
-
Neuromorphic computation is based on spike trains in which the location and frequency of spikes occurring within the network guide the execution. This paper develops a framework to monitor the correctness of a neuromorphic program's execution using model-based redundancy, in which a software-based monitor detects discrepancies between the behavior of neurons mapped to hardware and that predicted by a corresponding mathematical model in real time. Our approach reduces the hardware overhead needed to support the monitoring infrastructure and minimizes intrusion on the executing application. Fault-injection experiments utilizing CARLSim, a high-fidelity SNN simulator, show that the framework achieves high fault coverage using parsimonious models which can operate with low computational overhead in real time.
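The model-based redundancy idea can be illustrated with a toy monitor: run a cheap mathematical model of the neuron alongside the hardware and flag time steps where the observed state diverges beyond a tolerance. This is a minimal sketch assuming a simple leaky-integrator neuron model; the parameter names and model are illustrative, not the paper's.

```python
def monitor(hw_potentials, inputs, v0=0.0, leak=0.9, tol=1e-3):
    """Compare hardware-reported membrane potentials against a
    leaky-integrator model prediction; return the indices of
    time steps whose discrepancy exceeds the tolerance."""
    v = v0
    faulty_steps = []
    for t, (observed, i_in) in enumerate(zip(hw_potentials, inputs)):
        v = leak * v + i_in  # model's predicted potential at step t
        if abs(observed - v) > tol:
            faulty_steps.append(t)
    return faulty_steps
```

A parsimonious model like this keeps the per-step cost of the software monitor to a few arithmetic operations per neuron, which is what makes real-time checking feasible.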
-
The paper develops a methodology for the online built-in self-testing of deep neural network (DNN) accelerators to validate the correct operation with respect to their functional specifications. The DNN of interest is realized in the hardware to perform in-memory computing using non-volatile memory cells as computational units. Assuming a functional fault model, we develop methods to generate pseudorandom and structured test patterns to detect hardware faults. We also develop a test-sequencing strategy that combines these different classes of tests to achieve high fault coverage. The testing methodology is applied to a broad class of DNNs trained to classify images from the MNIST, Fashion-MNIST, and CIFAR-10 datasets. The goal is to expose hardware faults which may lead to the incorrect classification of images. We achieve an average fault coverage of 94% for these different architectures, some of which are large and complex.
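Pseudorandom test-pattern generation for built-in self-test is classically done with a linear-feedback shift register (LFSR). The sketch below shows a generic Fibonacci LFSR generator as one plausible pseudorandom stimulus source; the paper's actual pattern generators are not specified here, so treat this purely as background illustration.

```python
def lfsr_patterns(seed, taps, width, count):
    """Generate pseudorandom test patterns with a Fibonacci LFSR.

    seed:  nonzero initial state
    taps:  bit positions XORed to form the feedback bit
    width: register width in bits
    count: number of patterns to emit
    """
    state = seed
    patterns = []
    for _ in range(count):
        patterns.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & ((1 << width) - 1)
    return patterns
```

With a primitive feedback polynomial (e.g., x^4 + x^3 + 1 for a 4-bit register, taps at bits 3 and 2), the LFSR cycles through all 2^width - 1 nonzero states, giving good pattern coverage before repeating; structured patterns would then be sequenced in to target faults the pseudorandom phase misses.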
-
An emerging use-case of machine learning (ML) is to train a model on a high-performance system and deploy the trained model on energy-constrained embedded systems. Neuromorphic hardware platforms, which operate on principles of the biological brain, can significantly lower the energy overhead of a machine learning inference task, making these platforms an attractive solution for embedded ML systems. We present a design-technology tradeoff analysis to implement such inference tasks on the processing elements (PEs) of a Non-Volatile Memory (NVM)-based neuromorphic hardware. Through detailed circuit-level simulations at scaled process technology nodes, we show the negative impact of technology scaling on the information-processing latency, which impacts the quality-of-service (QoS) of an embedded ML system. At a finer granularity, the latency inside a PE depends on 1) the delay introduced by parasitic components on its current paths, and 2) the varying delay to sense different resistance states of its NVM cells. Based on these two observations, we make the following three contributions. First, on the technology front, we propose an optimization scheme where the NVM resistance state that takes the longest time to sense is set on current paths having the least delay, and vice versa, reducing the average PE latency, which improves the QoS. Second, on the architecture front, we introduce isolation transistors within each PE to partition it into regions that can be individually power-gated, reducing both latency and energy. Finally, on the system-software front, we propose a mechanism to leverage the proposed technological and architectural enhancements when implementing a machine-learning inference task on neuromorphic PEs of the hardware. 
Evaluations with a recent neuromorphic hardware architecture show that our proposed design-technology co-optimization approach improves both performance and energy efficiency of machine-learning inference tasks without incurring high cost-per-bit.
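The first contribution above amounts to an assignment problem: pair the NVM resistance state that is slowest to sense with the current path that has the least parasitic delay, and vice versa. The sketch below illustrates this opposite-sort pairing (a consequence of the rearrangement inequality, which minimizes the largest combined delay); it is a simplified abstraction of the proposed scheme, and the function name and data shapes are hypothetical.

```python
def assign_states_to_paths(path_delays, state_sense_times):
    """Map each resistance state to a current path so that the
    slowest-to-sense state lands on the fastest path (and vice
    versa), bounding the worst-case combined read delay.

    Returns a dict: state index -> path index.
    """
    # Paths from fastest to slowest.
    paths = sorted(range(len(path_delays)), key=lambda p: path_delays[p])
    # States from slowest-to-sense to fastest.
    states = sorted(range(len(state_sense_times)),
                    key=lambda s: state_sense_times[s], reverse=True)
    return dict(zip(states, paths))
```

For example, with path delays [1, 3] and sense times [2, 8], the opposite-sort pairing caps the worst combined delay at 8 + 1 = 9, whereas the naive same-order pairing reaches 3 + 8 = 11.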
-
Spiking Neural Networks (SNNs) are an emerging computation model that uses event-driven activation and bio-inspired learning algorithms. SNN-based machine learning programs are typically executed on tile-based neuromorphic hardware platforms, where each tile consists of a computation unit called a crossbar, which maps neurons and synapses of the program. However, synthesizing such programs on off-the-shelf neuromorphic hardware is challenging. This is because of the inherent resource and latency limitations of the hardware, which impact both model performance, e.g., accuracy, and hardware performance, e.g., throughput. We propose DFSynthesizer, an end-to-end framework for synthesizing SNN-based machine learning programs to neuromorphic hardware. The proposed framework works in four steps. First, it analyzes a machine learning program and generates an SNN workload using representative data. Second, it partitions the SNN workload and generates clusters that fit on crossbars of the target neuromorphic hardware. Third, it exploits the rich semantics of the Synchronous Dataflow Graph (SDFG) to represent a clustered SNN program, allowing for performance analysis in terms of key hardware constraints such as the number of crossbars, the dimension of each crossbar, buffer space on tiles, and tile communication bandwidth. Finally, it uses a novel scheduling algorithm to execute clusters on crossbars of the hardware, guaranteeing hardware performance. We evaluate DFSynthesizer with 10 commonly used machine learning programs. Our results demonstrate that DFSynthesizer provides a much tighter performance guarantee compared to current mapping approaches.
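The final scheduling step can be pictured with a generic list scheduler: clusters form a dependency graph (as in an SDFG), and each ready cluster is dispatched to the earliest-available crossbar. This is a textbook sketch for intuition only, not DFSynthesizer's novel algorithm; the data shapes are assumptions.

```python
import heapq

def schedule_clusters(exec_times, deps, num_crossbars):
    """List-schedule dependent clusters onto a fixed pool of crossbars.

    exec_times:    list, exec_times[i] = run time of cluster i
    deps:          dict, deps[i] = list of clusters that must finish first
    num_crossbars: number of crossbars available
    Returns (finish_times, makespan).
    """
    n = len(exec_times)
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for v, preds in deps.items():
        for u in preds:
            successors[u].append(v)
            indegree[v] += 1
    ready_time = [0.0] * n
    finish = [0.0] * n
    free = [(0.0, c) for c in range(num_crossbars)]  # (available-at, crossbar)
    heapq.heapify(free)
    ready = [i for i in range(n) if indegree[i] == 0]
    while ready:
        # Dispatch the ready cluster whose inputs arrive earliest.
        ready.sort(key=lambda i: ready_time[i])
        i = ready.pop(0)
        available_at, c = heapq.heappop(free)
        start = max(available_at, ready_time[i])
        finish[i] = start + exec_times[i]
        heapq.heappush(free, (finish[i], c))
        for v in successors[i]:
            ready_time[v] = max(ready_time[v], finish[i])
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return finish, max(finish)
```

An SDFG-based analysis goes further than this sketch by bounding throughput analytically (including buffer space and communication bandwidth), which is what enables the performance guarantee the abstract claims.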
